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Abstract 

We introduce a general purpose algorithm for rapidly computing certain types of 
oscillatory integrals which frequently arise in problems connected to wave propagation 
and general hyperbolic equations. The problem is to evaluate numerically a so-called 
Fourier integral operator (FIO) of the form J e 2nl ®( x '^a(x, £) /(£)d£ at points given on 
a Cartesian grid. Here, £ is a frequency variable, /(£) is the Fourier transform of the 
input /, a{x, £) is an amplitude and is a phase function, which is typically as 

large as |£|; hence the integral is highly oscillatory at high frequencies. Because an FIO 
is a dense matrix, a naive matrix vector product with an input given on a Cartesian 
grid of size N by TV would require 0(7V 4 ) operations. 

This paper develops a new numerical algorithm which requires 0(N log TV) opera- 
tions, and as low as 0(y/~N) in storage space. It operates by localizing the integral over 
polar wedges with small angular aperture in the frequency plane. On each wedge, the 
algorithm factorizes the kernel e 2wi ^ x '^'a(x, £) into two components: 1) a diffeomor- 
phism which is handled by means of a nonuniform FFT and 2) a residual factor which 
is handled by numerical separation of the spatial and frequency variables. The key to 
the complexity and accuracy estimates is that the separation rank of the residual kernel 
is provably independent of the problem size. Several numerical examples demonstrate 
the efficiency and accuracy of the proposed methodology. We also discuss the potential 
of our ideas for various applications such as reflection seismology. 
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1 Introduction 

This paper introduces a general-purpose algorithm to compute the action of linear operators 
which are frequently encountered in analysis and scientific computing. These operators take 
the form 



(Lf)(x)=f a(x,0e 2 ™ Hx '°f(0dt, (1.1) 



where is a smooth phase function obeying the homogeneity relation $(x,A^) = 

A3>(x,£) for A positive, and a(x,£) is a smooth amplitude term. As is standard, / is the 
Fourier transform of / defined by 

f(0= [ f(x)e~ 2 ^dx. (1.2) 

With the proper regularity assumptions on the phase and amplitude to be detailed later, 
(|1.1|) defines a class of oscillatory integrals known as Fourier integral operators (FIOs). FIOs 
are the subject of considerable study for many of the operators encountered in physics and 
other fields are of this form. For instance, most differential and pseudodifferential operators 
are FIOs. Convolutions and multiplications by smooth functions are FIOs. Some "principal 
value" integrals are FIOs. And the list goes on. 

An especially important example of FIO is the solution operator to the free-space wave 
equation in R d , d > 1, 

^r{x,t) =c 2 Au{x,t), (1.3) 

with initial conditions u(x,0) = uq{x) and ^f(x,0) = 0, say. Everyone knows that for 
constant speeds, the Fourier transform decouples the different frequency components of 
u. Each Fourier component obeys an ordinary differential equation which can be solved 
explicitly. The solution u(x,t) is the superposition of these Fourier modes and is given by 

u(x,t) = \ [j e^<^u (m + / e^<-^n (m) • (1.4) 

The connection is now clear: the solution operator is the sum of two Fourier integral 
operators with phase functions 

$±(x,0 = x-£±c|£|t. 

For variable but reasonably smooth sound speeds c(x), the solution operator is for small 
times a sum of two FIOs with more complicated phases and amplitudes. In particular, the 
phase can be constructed from the optical traveltime in a medium with index of refraction 
l/c(x), see for details. 

In short, it is useful to think of FIOs as proxies for the solution operator to large classes 
of hyperbolic differential equations. 

1.1 FIO computations 

Numerical simulation of free wave propagation with constant sound speed is straightforward. 
As long as the solution u(x, t) is sufficiently well localized both in space and frequency, it 
can be computed accurately and rapidly by applying the sequence of steps below. 

1. Compute the Fast Fourier Transform (FFT) of uq. 

2. Multiply the result by e ± 2 ^ ic l^ I* 5 anc l sum as i n 

3. Compute the inverse FFT. 

Of course, this only works in the very special case where the amplitude a is independent of x, 
and where the phase is of the form x-£ plus a function of £ alone. Expressed differently, this 
works when the FIO is shift-invariant so that it is diagonal in the Fourier basis. Note that 
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there is in general no formula for the eigenfunctions when $ or a depend on x. Computing 
these eigenfunctions on the fly is out of the question when the objective is merely to compute 
the action of the operator. (Note that even if the spectral decomposition of the operator 
were available, it is not clear how one would use it to speed up computations.) 

The object of this paper is to find an algorithm that is considerably faster than eval- 
uating by direct quadratures, and is yet suited to handle large classes of phases 
and amplitudes. Most of the existing fast summation techniques rely on either the non- 
oscillatory behavior (such as wavelet based techniques [5]) or the existence of a low rank 
approximation (fast multipole methods [2111 ; hierarchical matrices j2^> pseudodifferential 
separation [1]). The difficulty here is that the kernel e 2m $(x£) j g highly oscillatory and does 
not have a low rank separated approximation. Therefore, all the modern techniques are 
not directly applicable. 

The main claim of this paper, however, is that there is a way to decompose the operator 
into a sum of components for which the oscillations are well-understood and low-rank 
representations are available. In addition, the number of such components is reasonably 
small which paves the way to faster algorithms. Before expanding on this idea, we first 
explain the discretization of the operator ljl.ll) . 

1.2 Discretization 

For simplicity, we restrict our attention in this paper to the two dimensional case d = 2. 
The situation in which d > 3 is exactly the same. 

Just as the discrete Fourier transform is the digital analogue of the continuous Fourier 
transform, one can also introduce discrete Fourier integral operators. Given a function / 
defined on a Cartesian grid X = {x = ^),0 < ni,ri2 < N and ni,rt2 G Z}, we simply 
define the discrete Fourier integral operator by 

(Lf)(x) := l^a(x,Oe 2 ^^/(0 (1.5) 
ten 

for every x G X. (We are sorry for overloading the symbol L to denote both the discrete 
and continuous object but there will be no confusion in the sequel.) The summation above 
is taken over all $1 = {£ = (rai,^), — % < ni,n2 < y and n\,n2 G Z} and throughout this 
paper, we will assume that N is an even integer. Here and below, / is the discrete Fourier 
transform (DFT) of / and is defined as 

fa) 4E *-™*m- (i-6) 

The normalizing constant i in ()1.5|) (resp. Q1-6J1 ) ensures that L (resp. the DFT) is a 
discrete isometry in the case where <&(x, £) = x ■ £. 

The formula (|1.5|) turns out to be an accurate discretization of as soon as / obeys 
standard localization estimates both in space and frequency. A justification of this fact 
would however go beyond the scope of this paper, and is omitted. In the remainder of the 
paper, we will take (|1.5|) as the quantity we wish to compute once we are given a phase and 
an amplitude function. 

The parameter N measures the size and difficulty of the computational problem. In 
a nutshell, it corresponds to the number of points which are needed in each direction to 
accurately sample the continuous object f(x). This is the reason why N will be a central 
quantity throughout the rest of paper. 
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As mentioned earlier, the straightforward method for computing (|1.5|) simply evaluates 
the summation independently for each x. Since each sum takes 0(N 2 ) operations and there 
are N 2 grid points in X, this strategy requires 0(N i ) operations. When N is moderately 
large, this can be prohibitive. This paper describes a novel algorithm which computes all 
the values of Lf(x) for x G X with high accuracy in 0(N 2 ' 5 log N) operations. The only 
requirement is that the amplitude and the phase obey mild smoothness conditions, which 
are in fact standard. 

1.3 Separation within angular wedges 

This section outlines the main idea of the paper. Let arg£ be the angle between £ and the 
horizontal vector (1,0), and partition the frequency domain into a family of angular wedges 
{We} defined by 

W e = {£ : (21 - l)n/VN < arg£ < (21 + 1)tt/^/N} 

for < t < (assume is an integer). An important property of these wedges is that 
each We satisfies the parabolic relationship 

length ~ width 2 , (1-7) 

up to multiplicative constants independent of N. There are 0(\/~N) such wedges, as illus- 
trated in Figure ^ 

For each wedge We, we let xe be the indicator function of We- Similarly, we denote by 
£e the unit vector pointing to the center direction of We 




It follows from the identify Yle Xi(0 = 1 that one can decompose the operator L as Yle -^i 
where 

i 

Within each wedge We, we can perform a Taylor expansion of <3?(x, £) in the second variable, 
around the point There is a point which belongs to the line segment [&|£|,£] such 

that 

*(x,o = + v € $(s,&i£i) • (e - + \(e - £e\s\) T vtt*(.x,?)(e - 

By homogeneity of the phase ($(x, A£) = \$>(x,£) for A > 0), it holds that <£(x,£) = 

£ • Vf$(a;, £) and V^(x, £) = \/^(x, £). The first and third terms in the above expression 
cancel and thus 

= v^(x,ie) ■ e + - l£iei) T v^$(x,r)(r - 

The first term V^$(x,^) • £, which is linear in £, is called £/ie linearized phase and poses 
no problem as we will see later on. The rest, denoted as = $(&,£) — Vg$(x,^) • £ 

and called i/ie residual phase, is of order O(l) for £ E Vl 7 ^, independently of N. This follows 
from 

v^(x,e) = o(\e\- 1 ) = o(\^\- 1 ), 
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since is homogeneous of degree 1 in £, together with 

\e-m\\ 2 <\t-tm\ 2 = om 2 /N) = om 

for all |£| < N, which uses the fact that the shape of Wi obeys the parabolic relationship 



Figure 1: The frequency domain is partitioned into yJ~N equiangular wedges. 

Because the residual phase &e(x, £) is of order 0(1) independently of N, we say that the 
function e 27 ™*^ x '^ is nonoscillatory. Under mild assumptions, this observation guarantees 
the existence of a low rank separated representation which decouples the variables x and 
£ and approximates the complex exponential very well. Define the e-separation rank of 
a function f(x,y) of two variables as the smallest integer r e for which there exists c n (x), 
d n (y) such that 

r e -l 

\f{x,y) - ^2 c n( x ) d n{y)\ < e. 

n=0 

Then we prove the following theorem in Section [21 

Theorem. For all < e < 1, there exist N* > and C > such that for all N > N* , 
the e-separation rank of e 2lTt ® l<yX & for x G [0, l] 2 and £ € Wi obeys 

r e < log^Ce" 1 ). (1.8) 

In Section[2]we make explicit the values of the constants N* and C by relating them, among 
other things, to the smoothness of $ and the angular span of We. We will also provide 
results in the case where N < N* , and explain why the separation rank for the amplitude 
is also under control. 

The point of the theorem is that the bound on the e-rank does not grow as a function of 
iV — in fact, the threshold condition on N indicates that the e-rank decays as ./V grows. The 
logarithmic dependence on e is the signature of what is usually called spectral accuracy. 

Note that the decomposition into frequency wedges obeying the parabolic scaling has a 
long history in mathematics. A multiscale version of this partitioning, the second dyadic 
decomposition, was introduced by Fefferman in 1973 for the study of Boclmcr-Riesz multi- 
pliers JS], and used by Seeger, Sogge and Stein in 1991 to prove a sharp L p -boundedness 
result for FIO j2H] - More recently, it also served as the basis for the construction of curvelets, 
with applications to sparsity of FIOs and related results for wave equations |29l |SJ H3 . 
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1.4 Outline of the algorithm 

The low-rank separated representation provided by the theorem above offers us a way to 
compute ()1.5|) efficiently with high accuracy. Each term in the decomposition Lf = L#f 
can be further simplified as follows: 

= Jf E e 2 ^^< a(x, fle****^ X e(Of(0 



N 



oo 



,27riV £ *(x,&K 



E 7S(^)7l(0 Xl(Of(0 



t=i 



M0xe(0f(0 



(1.9) 



i=l 



Our analysis guarantees that the sum over t can be truncated to a fixed, hopefully small 
number of terms without significant loss of precision. 

In order to carry out the final summation over t, we first need to construct the functions 
jf t (x) and t1(£)- Sections 13 . II and 13 . 21 discuss two different methods to find these functions. 
In Section 13.11 we present an elementary deterministic approach, while in Section 13.21 we 
present a randomized approach that offers better efficiency both timewise and storagewise. 
Assuming that ^yf t (x) and 7^(0 are available for all values of t and t, the computation of 
(Lf) for a given / consists of the following 4 steps: 

1. Fourier transform / by means of the FFT to get /. 

2. Choose a bound q greater than the e-rank r e . For each £ and t < q, form fu(Cj '■= 

iU0xt(0f(0- 

3. For each £ and t < q, compute get(x) := e 2m ^ 'f^O^&K by means of a nonuni- 
form FFT algorithm. 

4. Compute (L/) (x) ss i YJ £ YJ^ =1 7£ x t (x)s-«(x). 

The only step that require further discussion is the computation of g^ f We defer the details 
to Section l3~H 

It is instructive to understand why linearizing the phase is so important. If we disregard 
the error introduced by the discretization in £, we observe that ga(x) is simply 

9i,t(x) = ft, t (Vz$(x,ie)). 

The interpretation of an oscillatory integral in the Fourier domain as a diffeomorphism 
is only possible when the phase is linear in £. For each £ and t, the computation of gi t t 
which is an interpolation problem, is therefore much simpler problem than applying the 
original operator. Admittedly, diffeomorphisms do not provide accurate approximations 
to FIOs over angular wedges, but the content of our analysis in Section [21 shows that the 
computational budget to make up for the residual is safely under control. 
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1.5 Significance 



Applying nontrivial FIOs repeatedly is a daunting task that has proved to be the com- 
putational bottleneck in various inverse problems. There is serious scientific as well as 
industrial interest in speeding up FIO computations, and accordingly a lot of resources 
have been invested over the past decades in engineering better codes. 

We believe that the ideas introduced in this paper provide new directions. To explain 
and illustrate this contrast, let us consider an example from the field of reflection seismol- 
ogy: Kirchhoff migration. The problem is to produce an image of the discontinuities in 
the Earth's upper crust from seismograms, i.e., wave measurements f(t,x r ) parameterized 
by time t and receiver coordinate x r . Glossing over the details, the core of Kirchhoff mi- 
gration consists in integrating several different functions f(t,x r ) over a fixed set of curves, 
parameterized as the level lines of some traveltime function r(x,x r ) + r(x,x s ): 



where x s is for us a fixed parameter (the source coordinate). We do not expect the reader 
unfamiliar with seismic imaging to understand all the physics underlying this equation. 
Anyone interested in details may want to consult [HHIj for example. This collection of 
integrals is called a generalized Radon transform (GRT), or in the field of image processing, a 
Hough transform. (For convenience, the Appendix explains why integration along ellipses — 
a simple GRT — is a sum of two FIOs.) A useful notation for Kirchhoff migration is g(x) = 
(F*f)(x), where F* is called the imaging operator. 

The standard algorithm for applying the imaging operator is a simple quadrature of 
f(t,x r ), interpolated and integrated along each curve t = t(x, x r ) + r(x, x s ) (parameterized 
by x.) If the data f(t,x r ) oscillates at a wavelength comparable to the grid spacing 1/N, 
then an accurate quadrature on a smooth curve requires O(N) points. Since x takes on 
0(N 2 ) values, the curve integration results in a total complexity of 0(iV 3 ) for applying 
the imaging operator (which is of course better than the 0(N 4 ) complexity of the naive 
summation.) 

In reality, the true F* is only approximated by a GRT. The derivation of the expression 
for F* from the wave equation reveals that if the geometry of the optical rays is not too 
complex, F* is in fact closer to an FIO than a GRT . This is akin to the observation that 
the retarded propagator of the wave equation in 2D is not a distribution strictly supported 
on the boundary of the light cone — only its singular support is the boundary of the cone. 
How to compute the action of an operator with such a singular kernel is much less obvious. 
The direct summation along curves provides a fragile, restricted paradigm for curvilinear 
integrals, the same way the FFT provides a fragile setting for shift-invariant problems. 

The advantages of our algorithm should now be clear: very general FIOs can be handled 
with an asymptotic computational complexity which is lower than that required for GRT 
summation, i.e. (0(iV 2 ' 5 log N) vs. 0(iV 3 )), and this without making any curvilinear 
approximation. The other argument in favor of the GRT method is the typically low 
memory usage. But this equally applies to our method. We will show that the storage 
overhead (on top of storing the phase and amplitude) is negligible and scales like 0(\^N). 

We only discussed applications to reflection seismology, but there are many other areas 
where nontrivial FIOs are computed routinely, e.g. as part of solving an inverse problem. 
Examples in radar imaging, ultrasound imaging, and electron microscopy all come to mind. 
Some Hough transforms for feature detection in image processing can also be formulated as 




5(t — t(x, x r ) — t(x, x s ))f(t, x r ) dt dx. 
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FIOs. In short, the ideas presented in this paper may enable the speed up of fundamental 
computations in a variety of problem areas. 

1.6 Related work 

In the case where $>(x, £) = x-£, the operator is said to be pseudodifferential. In this simpler 
setting, it is known that separated variables expansions of the symbol a(x, £) are good 
strategies for reducing complexity. For instance, Bao and Symes [I] propose a numerical 
method based on a Fourier series expansion of the symbol in the angular variable arg £, and 
a poly homogeneous expansion in |£|, which is a particularly effective example of separation 
of variables. 

Another popular approach for compressing operators is to decompose them in a well- 
chosen, possibly adaptive basis of I? . Once a sparse representation is achieved, evaluation 
simply consists of applying a sparse matrix in the transformed domain. In the case of ID 
oscillatory integrals, this program was advocated and carried out by Bradie et al. [7] and 
Averbuch et al. pj]. In spite of these successes, the generalization to multiple dimensions 
has so far remained an open problem. We will come back to this question in Section |S1 
and in particular discuss the relationship with modern multiscale transformations such as 
curvelets |H1 El and wave atoms ^1 • 

We would also like to acknowledge the line of research related to Filon-type quadratures 
for oscillatory integrals |2H] . When the integrand is of the form g(x)e lkx with g smooth 
and k large, it is not always necessary to sample the integrand at the Nyquist rate. For 
instance, integration of a polynomial interpolant of g (Filon quadrature) provides an ac- 
curate approximation to f g(x)e lkx dx using fewer and fewer evaluations of the function g 
as k — > oo. While these ideas are important, they are not directly applicable in the case 
of FIOs. The reasons are threefold. First, we make no notable assumption on the support 
of the function to which the operator is applied, meaning that the oscillations of /(£) may 
be on the same scale as those of the exponential e 2m ®( x £) m Second the phase does not 
in general have a simple formula that would lend itself to precomputations. And third, 
Filon-type quadratures do not address the problem of simplifying computations of several 
such oscillatory integrals at once (i.e. computing a family of integrals indexed by x in the 
case of FIOs) . 

Finally, we remark that FIOs are also interesting when the canonical relation is nontrivial — 
that is, multivalued phase — because they allow to study propagation of singularities of hy- 
perbolic equations in regimes of multipathing and caustics |221 116j . To mathematicians 
taking this specialized viewpoint, the focus of this paper may appear restrictive. Our out- 
look and ambition are different. We find FIOs to be interesting mathematical objects even 
when the canonical relation is a graph and degenerates to the gradient of a phase. Our 
concern is to understand their structure from an operational standpoint and exploit it to 
design efficient numerical algorithms. In fact, we expect this paper to be the first of a 
projected series which will eventually deal with more complex setups. 

1.7 Contents 

The rest of the paper is organized as follows. Section |2] proves all the analytical estimates 
which support our methodology. In Section |*3 we describe algorithms for constructing the 
low rank separated approximation, evaluating (L/)(x), as well as for evaluating its adjoint, 
namely, computing (L*/)(x). Numerical examples in Section 0] illustrate the properties of 
our algorithms. Finally, Section [5] discusses some related work and potential alternatives. 
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2 Analytical Estimates 



In this section, we return to a description of the problem in continuous variables x and £ 
to prove estimates on the separation rank of e 2m ^ l ^ x ^\ where $g(x, £) is the residual phase 
after linearization about ^. 

2.1 Background 

We begin with a lemma which concerns the separation of the exponential function and 
whose variations play a central role in modern numerical analysis. 

Lemma 1. Consider the domain defined by x G [— A, A] for some A > 0, and y £ [—1, 1]. 
For all e > the e-rank r t of e lxy on [—A, A] x [—1, 1] obeys the bound r e < r*, where 

r* = 1 + m&x{2eA , log 2 (2e^ 1 )}. (2.1) 

Furthermore, if A < ^- then the stronger bound 

< = ! + (2-2) 
holds as well. In both cases, the corresponding separated representation is the expansion 

r*—l 

\e ixy - Y —,x n y n \ < e. 

n=0 

Proof. The proof is very simple. We start with 

r- 1 



(ixy) n 



Y- 

71=0 



n>r n>r 



It is now straightforward algebra to check that the condition 

'log(2e- 1 )/log^, if 64 < 1/2, 

max(2e4, log 2 (2e -1 )), otherwise, 



r > 



suffices to bound the right-hand-side by e. Since the e-rank r e is integer- valued, the estimate 
on r may need to be rounded up to the next integer, hence the precaution of incrementing 
the bounds in (|2~T|) and (|2~2~]) by one. □ 

In the next section we will make use of Lemma [I\to prove that the nonoscillatory factor 
e 2iri$ e (x,£) a se p ara tion rank which is independent of N. The other factor in the kernel 
a(x, £)e 27ri ® e ( x '£\ namely, the amplitude a(x,£) is in general a simpler object to study. The 
standard assumption in the literature, and also in applications, is to assume that a(x,£) is 
a smooth symbol of order zero and type (1, 0), meaning that for each pair of integers (a, ft), 
there is a positive constant C a p obeying 



\d^a( X ,o\<c aP (i+\ey 



-M/2 



For simplicity, we will also assume that a(x,£) is compactly supported in x 1 . The nice 
separation properties of a are simple consequences of its assumed smoothness. 

This assumption is equivalent to assuming that functions in the range of L are themselves compactly 
supported in situations of interest, which ought to be the case for accurate numerical computations. 
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Lemma 2. Assume a(x, £) is a symbol of order zero. Then for all M > there exists 
Cm > such that for all e > 0, the e-rank for the separation of x and £ in a(x, £) obeys 

Proof. Perform a Fourier transform of the C°°, compactly supported function a(-,£). It 
suffices to keep 0(e _1 / A/ ) Fourier modes to approximate a(-, £) to accuracy e on its compact 
support. Each Fourier mode is of the form a(co,^)e iu)X , hence separated. □ 

It goes without saying that the e-rank of the product a(x, £)e 2m ® e ( x '& is bounded by a 
constant times the product of the individual e-ranks, and we now focus on the real object 
of interest, the factor e 2nt ® 1 . 



2.2 Large iV asymptotics 

In this section we assume that the phase £) is C 3 in £, only measurable in x, and define 
C k = 2vr sup sup |fl£$(x,f)| for < k < 3, 

where 8 = arg£. These constants will enter our estimates only through the following 
combinations: 

D 2 = C + C 2 , and D 3 = d + C 3 . 

As before, we also require homogeneity of order one in £. Finally, we let the general angular 
opening of the cone We to be radians, for some constant a (the introduction section 
proposed a = tt). 

The result below is a more precise version of the theorem we introduced in Section 1. 

Theorem 1. For all < e < 1, and N > -jg^r, the e-separation rank of e 2m ^ e ^ x '^ for 
x G [0, l] 2 and £ G Wi obeys 

r e <l + max{^a 2 D 2 , log 2 (4e^)}. (2.3) 



Furthermore, if a is admissible in the sense that a < \/-rP-, then 



eD2 ' 

r,<l + l ^X. (2.4) 
" log^#- 

Proof. Put r = |£| and as the angle measured from the vector The phase $ can 
be rewritten as $(a;,£) = r<p(x,6). Let £i be the frequency coordinate along £^ and £2 
orthogonal to £1, so that we can switch between polar and Cartesian coordinates using 

a4 1 a4 2 
where the derivative of is taken in 0. The residual phase is 

= r (0(a?, 6>) - cos 00(s, 0) - sin 00' (a, 0)) . 



10 



We can now expand cp(x, 9), cos 6 and s'm6 in a Maclaurin series (around 9 = 0) to obtain 



(<f>(x,0) + <j>"(x,0)) + 



— <j> (x,0) + — 4> (x,9) 
b b 



(2.5) 



for some 8 and 9 between and 9 (with 9 depending on x.) 

The x and £ variables are separated in the first term of equation (|2.5|) . so we write 

/(x)s(O = 2tt (0( s ,O) + ^"(x,O))^-. 
The term in square brackets is the remainder, and we write 

33 



R(x,£) = 2tt 



D 



Our strategy will be to choose N large enough so that R(x,£) becomes negligible, hence 
only the exponential of the first term needs to be separated. 

Recall that in 2D the frequency domain is the square [— y , y — I] 2 - Since \9\ < ^= in 

the wedge We, and r < ^N, we have the following bounds for the two terms in equation 



|/(z) 5 (£)| < ^ 2 ^2, \R(x,0\< 



V2 



O." 



L> 3 



12 ViV " 

It is instructive to notice that the bound on \fg\ is independent of N. That is the reason 
why we chose the angular opening of the cone We proportional to N~ l l 2 (parabolic scaling). 
The first contribution to the separation remainder is given by 



Afg+R) 



Jfg\ 



iR 



< \R\ < 



V2 



a 



12 



D 3 . 



The condition on N ensures precisely that this remainder be dominated by e/2. 

The second contribution to the total error is due to the separation of e 1 ^ 9 itself, and 
needs to be made smaller than e/2 as well. We invoke Lemma ^ with f(x) x sup|g(£)| 
in place of x, <?(£)/ sup 1^(^)1 in place of y, and e/2 in place of e. With these choices, A 
becomes -^a 2 Z?2, and we obtain the desired result. □ 



2.3 Small e asymptotics 

Theorem ^ is a special asymptotic result in the case of large N (problem size) — or alter- 
natively small a (cone's angular opening). This regime may not be attained in practice so 
we need another result, without restrictions on N, and informative for arbitrarily small e. 

To this effect, we need stronger (yet still realistic) smoothness assumptions on the phase 
<E>: for each x, we require that £) be a real-analytic function of £. This condition implies 
the bound 

2vr sup \d$$(x,£)\ < Qk\R~ k , 
141=1 

for some constants Q and R. For example, R can be taken as any number smaller than the 
uniform radius of convergence in 8, in which case Q will in general depend on R. Let us 
term such phases, or functions, (Q, R)- analytic. As before, we also require homogeneity in 
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Theorem 2. Assume <J>^(x,£) is measurable in x, and (Q,R)- analytic in £, for some con- 
stants Q and R. Assume that a is admissible in the sense that 

r r^N R . 

a<mm{ ~7^ } ' 

T/ien /or aZZ < e < 1, the e-separation rank of e 2m ® l ^ x & for x S [0, l] 2 and £ G We obeys 

9 

r e < C p e p , yp:p> 



log 2 ( 



Proof. Throughout the proof, x £ [0, l] 2 and £ £ W^. Using the smoothness assumption on 
we can repeat the reasoning of the proof of Theorem ^ and obtain the convergent series 

oo 

2^ e (x,0 = J2f^)9k(0, 

k=0 

where fk( x ) = 2ir(j)^ (x,0) (the differentiations are in 0) and 



r6 k 



50 (0 =r(l-cos0), <&(£)= r(0-sin0), = — 

We denote the bound \fk{x)9k(0\ — ^fci with 

3 . /o / ™ \ fc 



a 



..„ - 4 <3<* . * = ^<3^. * = yfl* for * > 2. 

Our strategy will be to call upon Lemma ^ for the first few factors e l -^ k9k , in order to 
obtain a separation rank r^ and an error for each of them: 

e ih 9k _ £ ^/ fc »(x)(7jJ(0 < e fc . (2.6) 

n=0 

We will perform this operation for each k < K, with K large enough, to be determined. 
Once the separation of each factor is available, we can write 

K-l 

g^Efc^o 1 fk-9k — JJ & ifk9k ^ 
k=0 

and obtain the bound on the overall separation rank as the product n^o 1 r k- 
There are two sources of errors we must contend with: 

• Truncation in k. The factors e l ^ k9k for k > K, will be deemed negligible if their 
combined contribution results in an overall error smaller than e/2, meaning 

| e 27ri* £ _ giEfjoVfeSfc] < 1 (2.7) 

The left hand side is bounded by | J2k fk9k\- Using the bound we stated earlier on 
Af~, and the admissibility condition on a, a bit of algebra shows that (|2.7j) is satisfied 
for 

r log(2 v / 2QA r e- 1 ) n , . 

K = & K v , ^ q 2.8 

log(^) 

(meaning the smallest integer greater than the quotient inside the brackets). This 
quantity in turn obeys log 2 (8e~ 1 ) < K < log 2 (16e~ 1 ). 
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Truncation in n. The truncation errors from (|2.6j) must be made sufficiently small 
so that their combined contribution also results in an overall error smaller than e/2, 
meaning 



K-l 



n £ -^(^m 



e 

< -. 
~ 2 



(2.9) 



k=0 k=0 n=0 

Easy manipulations 2 show that (|2.9ft follows from the bound 



3K' 



(Recall that K is comparable to log(Ce 

Such a bound holds if, in turn, we take rjt large enough. The admissibility condition 
on a ensures, among others, that we can invoke the strong version of Lemma 
namely equation (|2.2|) . and obtain 



n < 1 + 



(2.10) 



It now remains to estimate Ilfc^ 1 r k j where r& is given by equation ()2.1U|) and K by 
equation (|2.8|) . We treat the first two factors independently: we can check from the bounds 
on Aq and A±, and the admissibility condition on a, that 



As for the case k > 2, 



r k < 1 + 



r ri < Clog [2e _1 log(2e _1 ; 
log(6iTe- 1 ) 



log 



V2 I Ra/N 
QN 



< 



log(Ce- 1 log(2e- 1 )) + A: log 



RVN 



log(C) + Hog 



A + k 
B + k' 



(k > 2) 



We only simplified notations in the last line. Notice that A > B, and that B + k > 1 when 
k > 2. We will assume without loss of generality that A and B are integers. The value of the 



2 To justify this step, put E k {x,€) = e ifk{x)3k{6 ' > and start from the identity 

A'-l K-l 

n (£ fe + Cfc ) - n e * = e ^ n ^ + 

fc=0 fc=0 j fc^j 

K-l 

= Y.^ E J 1 \{{E k +r jk e k ) 

j fc=0 

where r jk = if j < k, and r jfe = 1 if j > k. Then make use of the bound (1 + jg) K < e c/3 < e 1/3 < 3/2. 
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product Y\ r k can only increase if we replace the initial bound < k < K, by the condition 
that the bound on r k be greater than 2. So we certainly have 

A + 2 A+3 A+A 



B + 2 B + 3 B + A 

k>2:r k >2 



(2A)\/(A + r 



" {B + A)\/{B + l)V 
We can now make use of the two-sided Stirling bound 

v&"' +1 /V n ' + ^T < n! < V2^n n+1 ' 2 e- n+ ^ 



to obtain 



(2A) 2A (A + 1)-( A +V 
Te - (A + B) A + B (B + 1)-( B + 1 ) 

^ r2 2A A™ (g + l)W> 

(A + l^+^iA + B)*- 1 (A + B)^ 1 

< C2 2A . 



In turn, 



2 2A <(C e - 1 log(2e- 1 )) ] 
which concludes the proof. □ 

The lower the fractional exponent of e _1 the faster the convergence of separated expan- 
sions. Theorem |21 shows exactly which factors can make this exponent arbitrarily small: 

• large grid size N, or 

• small angular opening constant a, or 

• large radius of analyticity R of the phase in arg £ (uniformly in x). 

Observe that the rank bound decreases as N increases. 

Theorem assumes that the residual phase function &£(x,£) is (Q, i?)-analytic in £. 
The variation below follows the same path of reasoning, and is useful when <3?^(x,£) is only 
C°° in £ for f ^ 0. 

Theorem 3. Assume <&t(x,£) is C°° in £ for £ ^ 0. For any p > 0, i/iere exists two 
constants C p and CL such that for any N , the e-separation rank with e = C p N~ p is bounded 
by C' p log N. 

Proof. The structure of the proof is similar to that of Theorem [21 One only needs to keep 
the first 2p + 2 term of the series 

oo 
k=0 

in order to have e = C p N~ p for some constant C p which depends only on p and The 
product n£c> lr fc upper bounds the overall separation rank, and is less than C' p logiV for 
some constant C' p which only depends on p. □ 

In many computational problems, the mesh size iV -1 is linked directly to the desired 
accuracy e, usually in the form of a power law, e.g. e = 0(N~ P ) for some constant p. 
Therefore, Theorem is interesting for practical reasons. 
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3 Algorithm 



For notational convenience, we assume in this section that the amplitude is identically equal 
to one; that is, we focus on the so-called (discretized) Egorov operator 



(Lf)(x) 



ten 



/(£)• 



(3.1) 



Both in practice (Section |3J) and in theory (Section (2J), one can easily take care of general 
amplitude terms. 

The algorithm for computing 1)3. 1|) has two main components: 

• Preprocessing step. Given the residual phase <5^(x,£) = <&(x,£) — V^$(x,^) • £, this 
step constructs, for each wedge We, a low rank separated approximation 



< e. 



The functions {jf t (x)} and {7| t (£)}, or their compressed versions, are then stored for 
use in the next step. 

Evaluation step. Given a function /, this step computes (Lf)(x) approximately by 



(Lf)(x) 



The preprocessing step is performed only once for a fixed phase function $ The 
family of functions {^(x)} and {7l(£)} should of course be used again and again to 
compute (Lf)(x) for different inputs /. 

In Sections 13. Il and l3. 21 we propose two different approaches for constructing the families 
{jet( x )} an d {letiO}- Section l3~H describes the details of the evaluation step. Finally, 
Section T3. 51 outlines the algorithm for rapidly applying the adjoint operator. In this section, 
we calculate time and storage complexity under the assumption of large grids, i.e. that of 
Theorem |21 For other kinds of asymptotics, one may need to adjust these estimates with a 
multiplicative logiV factor, which is typically negligible. 



3.1 Preprocessing step: deterministic approach 

We first describe a deterministic approach for constructing the low rank separated expan- 
sion, based on a Taylor expansion, exactly as in the proof of Lemma[I] For each wedge We, 
the strategy consists of the following sequence of steps: 

1. Construct a low rank separated approximation of &e(x, £). This is done by truncating 
the polar coordinates Taylor expansion to the (2p + l)st term 

2p+l 
k=l 

Here p is a constant that determines the level of accuracy. 
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2. For each k construct a separated expansion of e 27riC «=( z ) ®e) k . This is done by 
truncating the Taylor expansion to the first terms 

dik— 1 
m=0 

The value of each dgk is also chosen to obtain a good accuracy. 

3. Combine the separated expansions for k = 1, . . . , 2p + 1 into one separated represen- 
tation for e 27n * f ( x, £). Simply expanding the product of the expansions obtained in the 
previous step would be sufficient for proving a theorem like those presented in Section 
12 but in practice though, the number of terms in the expansion is too large and far 
from optimal. We thus combine the product of separated expansions two-by-two with 
the compression procedure to be described next, and repeat the process until there is 
only one separated expansion left. The final expansion provides us with the required 
functions {jf t (x)} and {7^(£)}- 

The compression procedure used to combine the product of two separated expansions 
is quite standard. Suppose we only have two expansions (the subscript i is implicit) and 
write their product as 

f E fimMPLAo) ( E 

\mi=0 / \m 2 =0 

mi ,7712 

We adopt the matrix notation and introduce 

{A) x ,m — c m( x )i (-S )m,£ = c m(£)- 

The problem is to find two matrices A and B which have far fewer columns than A and B, 
and yet obeying AB* ~ AB* . This may be achieved by means of the QR factorization and 
of the SVD: 

1. Construct QR factorizations A = QaRa and B = QbRb- 

2. Compute the singular value decomposition of RaR% and truncate the singular values 
below a threshold e together with their associated left and right singular vectors, i.e. 
RaR*b ~ UmSmVm w here Sm is a truncated diagonal matrix of singular values. 

3. Set A = QaUmSm and B = QbVm- 

Suppose A is m x q and B is n x q with both m and n much larger than q. The 
computational complexity of the compression procedure is 0((m + n)q 2 ). In our setup, 
m = \X\ = N 2 , n = \W(\ = 0(N 1,5 ), and g, the rank bound, is uniformly bounded in 
iV (Theorem |2] shows that q is bounded by a small fractional power of e, independently 
of iV). Therefore, the complexity of a single compression procedure is 0(N 2 ). Since this 
needs to be carried out 2p — 1 times for each of the wedges, the overall complexity of 
the deterministic preprocessing is 0(VN x iV 2 ) = 0(N 2 - 5 ) where the constant is directly 
related to the rank bounds of Section [2 



) 



21712 



(0 



C m (x)cf n (l;). 
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Next, let us consider the storage requirement. For each wedge, the size of the final 
separated expansion is 0(iV 2 ). Since there are wedges, the total storage requirement 
is 0(iV 2 ' 5 ), which can be costly when N is large. For example, in a typical problem with 
N = 1024 and q = 20, the total storage would be about 10 GB assuming double precision is 
used. Our second approach to solve the preprocessing step addresses this issue and requires 
dramatically less storage space. 

3.2 Preprocessing step: randomized approach 

This section describes a randomized approach for computing the functions {jf t (x)} and 
{jitiO} f° r a fixed I. The method is based on the work presented in Kapur and Long [2*1*] . 
We use matrix notations and set A to be the matrix defined by 

A X £ ■= e 2 ^**(*.0 j x G x, £ e W e . (3.2) 

The matrix A is m by n with m = N 2 and n = 0(N °). Assume the prescribed error 
e is fixed, Theorem |2] tells us that there exists a low rank factorization of A with rank 
r e = 0(1) (again, by this we mean that r e is bounded by a constant independent of N, 
although not independent of e). Using this knowledge, the following randomized method 
finds an approximate factorization 

A « UT, 

where U is of size m x q, T is q x n and q = 0(1) in N. 

1. Select a set C of r columns taken from A uniformly at random, and define A-\c] to be 
the submatrix formed by these columns. In practice, a safe choice is to take r about 
three times larger than the (unknown) r £ . 

2. Compute the singular value decomposition Atp\ ~ USV* where the diagonal of S 
contains only the singular values greater than the threshold e. Since A has a separation 
rank r e = 0(1), we expect U to be of size m x q where q is about r e . 

3. Select a set R of r rows taken from A uniformly at random, and define Ar™ to be the 
submatrix formed by these rows. Similarly, let L^j be the submatrix of U containing 
the same rows. 

4. Set T = U^,A{ R ] where is the pseudo-inverse of 

5. The matrices U and T provide an approximate factorization, i.e. A ~ UT. We 
identify the columns of U with the family {7^(2:)}, and the rows of T with {7l(£)}- 

This randomized approach works well in practice although we are not able to offer a 
rigorous proof of its accuracy, and expect one to be non-trivial. We merely argue that the 
validity of this methodology hinges on the following observations: 

• First, the columns of A are highly correlated. Following the arguments in Section 
it is not difficult to show that a pair of columns with nearby values of the frequency 
index £ G Wg have a large inner product. Therefore, as we sample uniformly at 
random, we get a good coverage of the set Wg (leaving no large hole) and as a result, 
the sampled columns nearly span the space generated by the columns of A. Note 
that one could also use a deterministic regular sampling strategy; for instance, we 
could take a Cartesian subgrid as a subset of Wg. We observed that in practice, the 
probabilistic approach provides slightly better approximations. 
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• As the SVD routine is numerically stable, it allows us to extract an orthobasis of the 
column space of AhQ] m a robust way. 

• By construction, the columns of U are orthonormal. Results from random projection 
and the geometry of high-dimensional spaces imply that, as long as U does not corre- 
late with the canonical orthobasis, the columns of U\g\ are almost orthogonal as well. 
This allows us to recover the matrix T in a stable and robust fashion. 

The computational complexity of this randomized approach is quite low. The SVD 
step has a complexity of 0(mr 2 ) = 0(N 2 ), while the matrix product T = U^A^ takes 

0(nrq) = 0(A r15 ) operations. Therefore, for each £, the complexity of the randomized 
approach is 0(N 2 ). Since the same procedure needs to be carried out for all the yN 
wedges, the overall complexity is 0(iV 2 ' 5 ). 

Often we do not know the exact value of r £ . Instead of setting r conservatively to 
be an unnecessarily large number, this difficulty is addressed as follows: we begin with 
a small r, and check whether q is significantly smaller than r. If this is the case, we 
accept the factorization. Otherwise, we double r and restart the process. Geometrical 
increase guarantees that the work wasted (due to unsuccessful attempts) is bounded by the 
work of the final successful attempt. In practice, we accept the result when q < r/3, and 
this criterion seems to work well in our numerical experiments. A more conservative test 
certainly improves the reliability of the factorization but increases the running time. 

We finally examine the storage requirement. A naive approach is to store the matrices 
U and T for each wedge W^. As T is much smaller than U in size, the storage requirement 
for each wedge is roughly the size of U, which is N 2 q = 0{N 2 ). Multiplying this by the 
number of wedges gives a total storage requirement of 0(N 2 ' 5 ), which can be quite costly 
for large N as already mentioned in the last section. We propose to store the matrices VS" 1 
and instead. Both matrices only require storage of size 0(rq) = 0(1). Whenever we 

need U and T, we form the products U = A^VS" 1 and T = U^A^. Note that the 
elements of the matrices A\p\ and Aw are given explicitly by the formula (|3.2|) and there 
is of course no need to store them at all. Putting it differently, we rewrite the computed 
factorization as 

A « A [c] VS^Ufa A [R] (3.3) 

and store only the matrices VS~ 1 and U^. 

We would like to point out that such a scheme is not likely to work for the deterministic 
approach. The main reason is that the deterministic approach involves multiple compression 
procedures which make use of QR factorizations and SVD decompositions. These numerical 
linear algebra routines are quite complicated, and therefore, it would be difficult to relate 
the resulting low-rank factorization with the elements of the matrix A, which have the 
simple form (|3.2jl . 

3.3 Comparison 

Tabled compares the deterministic and the randomized approaches in view of the computa- 
tional complexity and storage requirement. The deterministic approach has the advantage 
of guaranteeing an accurate low rank separation. However, the constant in the time com- 
plexity can be quite large as for each wedge, it requires 2p compression procedures to 
combine multiple separated expansions into a single one. Moreover, since the compression 
step uses QR factorizations and SVDs, we are forced to store the final expansion, which 
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can be quite costly for large N. In practice, the randomized approach constructs a near 
optimal low rank expansion with very high probability, requires very low storage space, and 
enjoys a significantly lower constant in time complexity since it does not utilize repeated 
QR factorizations or singular value decompositions. 





time storage 


randomized 
deterministic 


0(N 2 - 5 ) (small constant) O(VN) 
0(N 2 - 5 ) (large constant) 0(N 2 - 5 ) 



Table 1: Comparison of the deterministic and randomized approaches. 



3.4 Evaluation step 

Once the families {jf t (x)} and {^(0} are available, we use the approximation 

to evaluate Lf(x). The algorithm simply carries out the evaluation step by step: 

1. Compute /, the Fourier transform of /. 

2. For each £ and t, form := 7^)^ (£)/(£)■ 

3. For each t and t, compute g u (x) := £ f e 2 ™ v f*^H/&(£). 

4. Compute (Lf)(x) ~ ^ let( x )9£t(x). 

The only step that requires attention is the third: it asks to evaluate the Fourier series 
^ e 27riV e $(s,&K^(£) at the ^2 pointg {v € *(a?,^) :xeX}. Even though X is a Carte- 
sian grid, the warped grid {Vf$(a;, : x 6 X} is no longer so. In fact, the formula for 
is a nonuniform Fourier transform of the second kind, a subject of considerable attention 
[21151 1191 1261 127] since the seminal paper of Dutt and Rokhlin J7j. We adopt the approach 
introduced in the latter paper, and following their notations, set 

• m = 4, q = 8 and b = 0.425 for 6 digits of accuracy, 

• m = 4, q = 16 and b = 0.785 for 11 digits of accuracy. 

We specify these parameter values because they impact the numerical accuracies we will 
report in the next section, and because it will help anyone interested in reproducing our 
results. 

The algorithm in generally assumes that the Fourier coefficients are supported on 
the full grid which is symmetric with respect to the origin. For each £, the support of 
fu(0 is We, which is to say that most of the values of the input on the grid O are zero. To 
speed up the nonuniform fast Fourier transform, each wedge Wi, which is close to either 
one of the diagonals, is sheared by 45 degrees so that it becomes approximately horizontal 
or vertical. Notice that 45 degree shearing of is a simple relabeling of the array. In 

addition, all wedges are then translated so that their support fits in a rectangle of smaller 
volume centered around the origin. As the nonuniform FFT asks to compute the FFT 
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of the input data (and then finds a way of interpolating the result on an untructured grid), 
we gain efficiency since the input array is now of smaller size. Mathematically, the shearing 
operation takes the form 

= Mi - 

where M is either the identity or a 45-degree shear matrix and £ c is a translation parameter. 
Thus, we organize the computations as in 

Y, e^&'&K fet(0 = Yl e 2 ™ v ^^' M ^'+^ ht{M- x {i' + e c )) 

e 

where the final summation is a nonuniform Fourier transform at points (M*) _1 V^<I ) (x, ^). 
In condensed form, the oscillatory modes of the function we wish to evaluate are centered 
around a center frequency; we factor out this frequency, interpolate the residual, and add 
the factor back in; for the same accuracy, interpolating the smoother residual requires a 
smaller computational effort. 

A two-dimensional nonuniform fast Fourier transform takes 0(N 2 log N) operations. 
This operation needs to be repeated q = 0(1) times for each one of the y/~N wedges. 
Therefore, the overall complexity is 0(iV 2 ' 5 log N). 



3.5 Evaluating the adjoint operator 

We conclude this section by presenting how to rapidly apply the adjoint Fourier integral 
operator. Begin by expanding the Fourier transform in and write 

(L/)(x) = / (/ e^^-^d^ f(y)dy. 
for x,y,£ G M?. The adjoint operator is then given by 

(£7)0*0 = J U e-^^- x ^dA f(y)dy 
= J ^ e- 2 ™*^)/(y)dyJ e 2 ™^d£ 

or equivalently as 



m)(0 = / e- 2 ™*^f(y)dy 



in the Fourier domain. Similarly, one readily checks that the adjoint of the discrete-time 
FIO is given by the formula 



where £ G O and y G X. 



N 

y 
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Now follow the same set of ideas as in Section [3.41 and decompose L* as 

m)(o = ^E^)E e ~ 2 ™* (y,c) ^) 

i y 
i y 

= J? E MO E E 

= ^EE^(oS)E e ~ 2 ^ ( ^ } ^ (t!M/(2/)) • 

£ t !/ 

The right-hand side of the last equation provides the key steps of the algorithm. 

1. For each £ and t < q, compute f u (y) := lj t (y)f(y)- 

2. For each £ and t < q, compute g et (0 ■= T, y e~ 2ni ^^ eH fet{y) 

using the nonuniform 

fast Fourier transform of the first kind, see |17l I19j for details. 

3. Compute (Ff)(£) « ^ ^ Et 

4. Finally, take an inverse 2D FFT to get (L*f)(x). 

Clearly, all the results and discussions concerning the matrix vector product Lf apply here 
as well. 

4 Numerical Results 

This section presents several numerical examples to demonstrate the effectiveness of the 
algorithms introduced in Section |31 Our implementation is in Matlab and all the computa- 
tional results we are about to report were obtained on a desktop computer with a 2.6 GHz 
CPU and 3 GB of memory. We have implemented both the deterministic and randomized 
approaches for the preprocessing step. We choose to report the timing and accuracy results 
of the randomized approach only since it requires less time and storage as shown in Section 

We first study the error of the separated approximation generated by the randomized 
preprocessing step. For x = (xi,X2) and £ = (^1,^2), set the phase function to be 

$±(x,0=x-Z± yjr\{x)il + rl{x)e 2 - (4.1) 

We show in the Appendix that the transformation, which for each x integrates / along an 
ellipse centered at x and with axes of length r±(x) and ^(x), can be cast as a sum L + + L_ 
of two FIOs given by 

(L ± f)(x) = J a ± (x,Oe 2 ™ $± ^;m, (4.2) 
and with phases obeying (|4.1j) . 
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In our numerical example, we consider the phase $_|_ and choose 

nO) = -(2 + sin(4vrxi))(2 + sin(47nr 2 )), 
9 

rzix) = \(2 + cos(4vrxi))(2 + cos(4vrx 2 )). 
9 

In each wedge We, the phase is then linearized and a low rank separated approximation 
UT of the matrix 

is computed. To estimate the approximation error, we randomly select two sets V and A of 
s rows and s columns. Put A^a to be the s by s the submatrix of A with these rows and 
columns. The separated rank approximation to A^a is then obtained by multiplying Ur 
and ?a where Ur is the submatrix of U with rows in T and Ta is that of T with columns 
in A. The error is then estimated via 

\\A TA - U r T A \\ F 
UvaWf ' 

where || • \\f stands for the Frobenius norm. In our numerical test, we set s to be 200, and 
Table |2] displays approximation errors for different combinations of problem size N and 
accuracy e. The results show that the randomized approach works quite well and that the 
estimated error is controlled well below the threshold e. 





e =le-3 e =le-4 e =le-5 e =le-6 


N = 64 
N = 128 
N = 256 
N = 512 


3.57e-04 4.93e-05 3.21e-06 5.17e-07 
3.11e-04 2.28e-05 4.19e-06 5.81e-07 
2.85e-04 2.83e-05 2.94e-06 4.13e-07 
1.66e-04 2.82e-05 4.38e-06 6.80e-07 



Table 2: Relative errors of the low rank separated representation constructed using the 
randomized approach. 

Next, consider the relationship between the separation rank and the threshold e. Corol- 
lary 01 shows that e scales like N~ p for a fixed constant p provided that the separation rank 
grows gently like plogN. In this experiment, we use the same phase function in 
(|4.1|) . and show the separation rank for different values of iV and p in Table El These results 
suggest that the separation rank is roughly proportional to both p and the logarithm of 
N, which is compatible with the theoretical estimate. Moreover, when N is fixed, the rank 
seems to grow linearly with respect to p, which possibly implies that the constant C(p) in 
Theorem 01 in fact grows linearly with respect to p. 

We now turn to the numerical evaluation of (Lf)(x), 

(£/)(z) = ^E e2 "* (X '°/(6, (4-3) 

e 

where the phase function $ is the same as in (|4,1[) . In this example, / is an array of 
independently and identically mean-zero normal random variables (Gaussian white noise), 
which in some ways is the most challenging input. The threshold e is set to be 10 iV~ 2 
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p =1 


p =1.5 


p =2 


p=2.5 


p =3 


N =64 


7 


10 


14 


18 


22 


N =128 


9 


12 


17 


21 


24 


N =256 


9 


12 


17 


21 


25 


N =512 


10 


15 


19 


24 


27 



Table 3: Ranks of the separated representation generated by the randomized approach for 
different values of N and p. The prescribed error is equal to N~ p . 

(i.e., p = 2). To estimate the error, we first pick s points {xi : i = 1, . . . , s] from X and 
put {(Lf)(xi)} for the output of our algorithm fSection |3.4|) . We then compare the values 

of (Lf)(xi) at these points with those of {(Lf)(xi)} obtained by evaluating (|4.3jl directly. 
Finally, we estimate the relative error with 

^{Lf)( Xl )-{Lf){^ 

V EJ(£/)(^)I 2 

Here, we choose s = 100, and Table 0] summarizes our findings for various values of N. The 
results show that our algorithm performs well. The error is controlled well below threshold 
and the speedup over the naive algorithm is significant for large values of N. 



(N,e) 


Preprocessing(s) 


Evaluation(s) 


Speedup 


Error 


Storage(MB) 


(64,2.44e-03) 


2.06e+00 


3.89e+00 


2.05e+00 


2.08e-03 


0.76 


(128,6.10e-04) 


1.09e+01 


2.45e+01 


6.58e+00 


8.02e-04 


1.26 


(256,1.53e-04) 


8.10e+01 


1.65e+02 


1.67e+01 


1.00e-04 


2.01 


(512,3.81e-05) 


4.67e+02 


9.88e+02 


4.46e+01 


4.22e-05 


3.06 



Table 4: Numerical evaluation of Lf{x) with / a two dimensional white-noise array. The 
second and third columns give the number of seconds spent in the preprocessing and eval- 
uation steps respectively. The fourth column shows the speedup factor over the naive 
algorithm for computing (Lf)(x) using the direct summation (|4,3j) . The fifth column is 
the estimated relative error and the last gives the amount of memory used in terms of 
megabytes. 

We have only considered the evaluation of FIOs in "Egorov" form thus far (constant 
amplitude) but the algorithm described in Section |3] can be easily extended to operate 
with general amplitudes provided that the term a(x, £) also admits a low rank separated 
representation in the variables x and £. 

To study the performance of our algorithm in the more general setup of variable ampli- 
tudes, we continue with the example where / is integrated along ellipses (|4.2j) (recall the 
phase (|4.1j) ). The Appendix shows that a possible choice for the amplitudes a±(x,£) and 
phases $±(x, £) is 

a±(x,0 = ±- ( M2tt P (x,0) ±iY (2Trp(x,0))e^ 27rip(x 't\ (4.4) 

47T 

*±(x,S)=x-£±p(x,0 (4.5) 

with 

p(x,0 = Jri(x)$+r%(x)$l 
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Here, Jo and Yq are Bessel functions of the first and second kind respectively, see the 
Appendix for details. 

For the axes lengths, set 

r\(x) = r-zix) = r(x) = — (3 + sin(47rxi))(3 + sin(47ra;2)) (4-6) 

16 

(which means that our ellipses are circles). We compute (L + /)(x) for different values of 
N and e and provide the results in Table [5J The computational analysis shows that our 
algorithm performs equally well in the variable amplitude case. For N = 512, the speedup 
factor over the naive evaluation is about 162. 



(N,s) 


Preprocessing(s) 


Evaluation(s) 


Speedup 


Error 


Storage(MB) 


(64,2.44e-03) 


2.18e+01 


3.67e+01 


4.54e+00 


7.30e-04 


0.37 


(128,6. 10e-04) 


1.09e+02 


1.65e+02 


1.49e+01 


4.00e-04 


0.59 


(256,1.53e-04) 


6.62e+02 


8.46e+02 


4.49e+01 


1.39e-04 


0.89 


(512,3.81e-05) 


3.42e+03 


4.43e+03 


1.62e+02 


3.69e-05 


1.38 



Table 5: Numerical evaluation of Lf(x) with / a two dimensional white-noise array. 

An extremely important property of Fourier integral operators is that, under the non- 
degeneracy condition 

det ( \ 

the composition of an FIO with its adjoint preserves the singularities of the input function. 
Mathematically speaking, if WF(f) is the wave front set of / |16l I30j . then 

WF(L*Lf) = WF(f). 

This property serves as the foundation for most of the current imaging techniques in re- 
flection seismology [313] • In the final example of this section, we verify this phenomenon 
numerically. We choose the phase function to be 

= x ■ £ + r(x)\Z\, 

where r(x) is given by (|4.6|) . and compute (L*Lf) using the algorithm discussed in Sections 
13.41 and 13.51 Figure |2] displays results for three input functions with different kinds of 
singularities. Looking at the picture, we see that the singularities of Lf are of course 
different than those of /, but we also see that the singularities of L*Lf coincide with those 
of/. 



5 Discussion 

5.1 About randomized algorithms 

The method used in the randomized preprocessing step was first introduced by Kapur and 
Long |24| . Lately, there has been a lot of research devoted to the development of randomized 
algorithms for generating low rank factorizations, and we would like to discuss some of this 
work. 



24 










MM 

■fl 

■ 








J' I\ 



• 


• 








• 


• 

• 

• 








• 


• • 





Figure 2: Numerical verification of the fact WF(L* Lf) = WF(f). Each row, from left to 
right, shows the magnitudes of f(x), (Lf)(x) and {L* Lf){x) . Notice that the wave front 
set of f(x) and (L*Lf)(x) are numerically as close as they can be. Remark: the images 
on the left column and on the right column are not supposed to be the same; only their 
"singularities" coincide. In other words, the adjoint L* is not the inverse of L. 
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Drineas, Kannan and Mahoney describe a randomized algorithm for computing 
a low-rank approximation to a fixed matrix. The main idea is to form a submatrix by 
selecting columns with a probability proportional to their norm. Since this work is about 
unstructured general matrices, it does not guarantee a small approximation error. As an 
example, suppose all the columns of the matrix have the same norm and one of them is 
orthogonal to the span of the other columns. Unless this column is selected, the orthogonal 
component is lost and the resulting approximation is poor. 

Our situation is different. Since each entry of our matrix 

V Jxex,£ew t 

has unitary magnitude, the uniform probability used in our algorithm is actually the same 
as that proposed above ^Sj- I n some ways then, our approach is a special case of that 
of Drineas et. al. But the point is that our matrix has a special structure. As we argued 
earlier, the columns of A are often highly correlated and we believe that this is the reason 
why the randomized subsampling performs well. 

A recent article by Martinsson, Rokhlin and Tygert HHj presents a new randomized 
solution to the same problem. The only inconvenience of this algorithm, probably inevitable 
for general matrices, is that one needs to visit all the entries of the matrix multiple times. 
This can be quite costly in our setup since there are 0(N A ) entries. This is why we adopt 
the method by Kapur and Long. 

5.2 Storage compression 

We would like to comment on the storage compression strategy discussed at the end of 
Section E21 In fact, what we described there can be viewed as a new way of compressing 
low rank matrices. 

In a general context, the entries of a matrix can be viewed as interaction coefficients 
between a set of objects indexed by the rows and another set indexed by the columns. In 
our case, the first set contains the grid points x in X, while the second set consists of the 
frequencies £ in We,. Call these two sets I and J, and the interaction matrix Aj t j. The 
standard practice for compressing Aj t j is to find two sets I' and J' of smaller sizes and 
form an approximation 

Ai,j » M IiP Mp t j,Mj ft j. 

Here /' is either a subset of I or a set which is close by in some sense, and likewise for 
J' and J. For example, in the fast multipole method of Greengard and Rokhlin |2U) . 
J' is the multipole representation at the center of the box containing J while I' is the 
local representation at the center of the box containing /. The matrices Mj p, Mji ji and 
Mji^j are implemented as the multipole-to- multipole, multipole-to- local and local-to-local 
translations. This becomes even more obvious when one considers the newly proposed 
kernel independent fast multipole method by Ying, Biros and Zorin There, I' and J' 
are the equivalent densities supported on the boxes containing / and J, while Mjji, Mp t ji 
and Mji j can be computed directly from interaction matrices and their inverses. In both 
cases, we are fortunate in the sense that prior knowledge offers us efficient ways to multiply 
Mj p, Mji^ji and Mj> j with arbitrary vectors. Whenever this is not true, one might be 
forced to store these matrices, which could be quite costly. 

What we have presented in (|3.3|) is a totally different factorization: 

Au « AjjiRjipApj. 
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Figure 3: Factorization of interaction between A and B. (a) the standard scheme, (b) the 
scheme abstracted from the storage compression method used (13 . 31) . 

Notice that since Ajji and Apj are interaction matrices themselves, there is no need to 
store them as long as we can compute the interaction coefficients easily. The only thing we 
need to keep in storage is the matrix Rjip. However, as long as the interaction is low rank, 
I 1 and J' have far fewer objects than / and J, so that Rj'p only uses very little storage. 
Finally, we would like to point out that, instead of representing the interaction from J' (a 
subset of J) to I' (a subset of I), Rjt is a reverse interaction. Figure |3] shows conceptually 
how the new factorization differs from the standard one. 

5.3 Curvelets, wave atoms and beamlets 

There might be other ways of evaluating Fourier integral operators, and we would like to 
discuss their relationships with the approach taken in this paper. 

Curvelets, proposed by Candes and Donoho ^0]; are two dimensional waveforms which 
are highly anisotropic in the fine scales. Each curvelet is identified with three numbers to 
indicate its scale, orientation and position, and the set of all curvelets form a tight frame. 
Recently, Candes and Demanet E] have shown that the curvelet representation of the 
Fourier integral operators is optimally sparse. More precisely, a Fourier integral operator 
only has 0(N 2 ) nonnegligible entries in the curvelet domain. The wave atom frame, which 
is recently introduced by Demanet and Ying ^2] > has the same property. If we were able to 
find such a representation efficiently, we would hold an 0(N 2 log N) algorithm for evaluating 
a Fourier integral operator which would operate as follows: 

1. Apply the forward curvelet transform to the input and get curvelet coefficients. 

2. Apply the sparse FIO to the curvelet coefficient sequence. 

3. Apply the inverse curvelet transform. 

Both steps 1 and 3 require at most 0(N 2 log N) operations 

Constructing the curvelet representation of FIO from the phase function 3>(x,£) effi- 
ciently has, however, proved to be nontrivial. At the moment, we are only able to construct 
an approximation which is asymptotically accurate by studying the canonical relation em- 
bedded inside the phase function <5(x,£). Such a construction would be adequate if we 
were interested in applying an FIO to input functions with only high frequency modes. 
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However, one often wants a representation which is accurate for all frequency modes, and 
we are currently not aware of any efficient method for constructing such a representation. 

Beamlets |14j were introduced by Donoho and Huo at roughly the same time as curvelets. 
Beamlets are small segments at different positions, scales and orientations. As pointed out 
in Section curvilinear integrals make up an important subclass of FIOs, and beamlets 
may offer ways to efficiently compute such simpler integrals. One might think of something 
like this: 

1. Compute the beamlet coefficient sequence of the input. 

2. For each x £ X, figure out the integration curve and approximate it with a chain of 
beamlet segments. Sum up the beamlet coefficients along the chain. 

Assuming the integration curves are twice differentiable, we would need about y/N beamlet 
segments to approximate each curve. Thus, the overall complexity of this algorithm might 
scale like 0(iV 2 ' 5 ), which is the same scaling as that of our algorithm. The problem is 
that it is unclear how one would efficiently approximate the integration curve with beamlet 
segments without sacrificing accuracy. Situations in which the input function / is highly 
oscillatory or in which the integration curves have parts with a high curvature seem very 
problematic. 

Our algorithms decompose the FIO in the frequency domain whereas the beamlet based 
approach processes data in the spatial domain. Sandwiched right in the middle, curvelets 
and wave atoms operate in the phase-space — the product of the frequency and of the spatial 
domains. We believe that operating in phase-space by exploiting the microlocal properties 
of FIOs would be important to bring down the complexity to the optimal value of about 
TV 2 operations. 

A Integration Along Ellipses 

The material in this section is probably not new, but we expand on it for the convenience of 
the nonspecialist. Consider the generalization Radon transform that consists in integrating 
f(x) along ellipses of axes lengths r\(x) and ^(x), and centered around x: 

°/w = //("+(feO)* 

We want to recast it as a sum of FIOs. Let us start by writing 

Gf(x) = j K(x,0f(0d£, 

with 

K(x, = e 2nix -t J exp 

Put p(x,£) = \fr'i{x)£i + r'i(x)£% and rewrite 

*<^>— S /e*P 

Here a 2 + j3 2 = 1, and both a and (3 depend on x and £ but the value of the integral is 
independent of their particular value. This is because any change of variables 9 — > 9+<p(x, £), 



( r l( x)cosC 
r2{x) sin a 



dO. 



2nip(x, £) 



cos 6 



sin I 



ol(x, £) 
f3(x,0 



de. 
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effectively corresponding to a rotation of the unit vector (a, /?), keeps the integral invariant. 
So we may as well take a = 1, J3 = and obtain 

/ p 2nix-t; 
e 2mp(x,0cose dQ = ^__j Q ( 2lrp ( Xi {)). 

Of course the Bessel function Jo oscillates, and we need to extract the phase from its 
asymptotic behavior 



J (27T P (X,0) 




The idea is now to express Jq{2it p(x , as a sum of two terms, each of which being the 
product between a smooth amplitude (a demodulated version of Jo or the envelope of Jo if 
you will) and the oscillatory exponential e ± 2m P( x £) . i n effect, K is decomposed as a sum 
of two FIOs: 

K(x, = a+(x, f ) e 2*«+(*»0 + a _(x, £) e 2«*- 

with 

There are different ways to choose the amplitudes. One way is to let Yq be the Bessel 
function of the second kind of order zero [Q and exploit the identity 2Jo = (Jo + iYq) + 
(Jo — iYq), which allows to write 

a±(x,0 = ^( Jo(2^(x,0)±^o(2vrp(x,0))e^^). 

Both amplitudes behave asymptotically like ■\Jl/-K 2 p{x, £) as x — > oo, which incidentally 
shows that the order of the FIO is —1/2. The logarithmic singularity of Yq near the origin 
in £ is mild and easily regularized with no loss of accuracy. 
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